Fix SSP restoring in edge cases #104820
There are edge cases in which restoring the SSP for the continuation after a catch handler completes doesn't work correctly. The problem is that we scan the shadow stack for the Rip of the frame handling the exception to find where to restore the SSP, and in those edge cases the same address can be there multiple times and the first occurrence is not the right one. For example, when an exception is thrown from a catch handler, it escapes the handler, and the handler for the escaped exception is in the same method as the one that invoked the handler.
This change fixes it by finding the SSP of the first managed frame where we search for the handler and then updating the SSP with every unwind. The SSP is stored in the REGDISPLAY, so when we reach the CallCatchFunclet, the REGDISPLAY contains the SSP to restore.
There was also one more issue with restoring the SSP. It turned out that the incsspq instruction uses only the lowest 8 bits of its argument to increment the SSP, so the ClrRestoreNonVolatileContextWorker needs a loop that repeats the instruction in case we need to move the SSP by more than 255 slots.
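To illustrate the first-match problem described above, here is a toy model in C++. It is only a sketch, not the runtime's code: the `ShadowStack` alias and the functions `ExampleStack`, `FindByScan`, and `FindByTracking` are hypothetical names, the addresses are made up, and "one slot per unwound frame" is a simplifying assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (assumption): index 0 is the top of the shadow stack,
// higher indices hold older return addresses.
using ShadowStack = std::vector<std::uint64_t>;

// Made-up contents: the return address 0x200 is recorded twice.
ShadowStack ExampleStack()
{
    return {0x10, 0x200, 0x30, 0x200, 0x40};
}

// Scanning approach: return the slot of the first entry, counted from
// the top, that equals `rip`. Returns ss.size() when nothing matches.
std::size_t FindByScan(const ShadowStack& ss, std::uint64_t rip)
{
    for (std::size_t i = 0; i < ss.size(); ++i)
    {
        if (ss[i] == rip)
            return i;
    }
    return ss.size();
}

// Tracking approach: start from the slot of the first managed frame and
// advance by one slot for every frame unwound on the way to the handler.
std::size_t FindByTracking(std::size_t firstManagedFrameSlot,
                           std::size_t framesUnwound)
{
    return firstManagedFrameSlot + framesUnwound;
}
```

In this model, scanning for `0x200` stops at slot 1, while tracking three unwound frames from slot 0 lands on slot 3. Both slots hold the same address, so the scan has no way to tell them apart; restoring to slot 1 would leave two stale entries on the shadow stack.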
// The float updating unwinds the stack so the pRD->pCurrentContext->Rip contains correct unwound Rip
// This is used for exception handling and the Rip extracted from m_pCallerReturnAddress is slightly
// off, which causes problem with searching for the return address on shadow stack on x64, so
// we keep the value from the unwind.
Was it a subtle bug that the Rip was off, or did we just not care about the correct IP in this case?
That's actually expected because of how the InlinedCallFrame is created in some cases. Neither the GC stack walk nor the new EH cared, but it caused a problem with the newly added search for the first managed frame, as that frame is sometimes the one that called the pinvoke (QCALL).
Do we have test coverage for these corner cases?
There is no way to really test those, as Windows would happily run and silently fix things up.
Also, testing the topmost SSP against what we can figure out from SP is rarely helpful either. The top item is often correct even though we did not pop things properly and the SSP keeps growing, for example. There is some sensitivity to a wrong SSP in hijacking, as the OS unhijacks using the SSP and we use our own stashed value (but we assert that it works the same), so an incorrect SSP may lead to asserts. However, observing this requires a fairly tight race with suspension.
Maybe there is a way to disable the OS fixup behavior, but then I'd not be surprised if something beyond our control requires it and nothing would run in such a mode, even if it were possible to set up.
Update_Loop:
cmp r11, rax
cmovb rax, r11
incsspq rax
This is indeed documented as only bits 0-7 are used, regardless of operand size.
Perhaps there is a way to save a byte of assembly by doing INCSSPD eax
:-)
The incsspd increments the SSP by multiples of 4, not by 8 :-)
I'd also make it use bits 0-3, because why not ... :-)
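The chunked update discussed in this thread can be sketched in C++ as follows. This is an emulation for illustration only, not the assembly from the PR: `EmulatedIncsspq` and `AdvanceSsp` are hypothetical names, and the emulation assumes only what the thread states, namely that bits 0-7 of the operand are used and that each `incsspq` slot is 8 bytes.

```cpp
#include <algorithm>
#include <cstdint>

// Emulates the behaviour quoted above: only the low 8 bits of the
// operand count, and each slot moves the SSP by 8 bytes.
std::uint64_t EmulatedIncsspq(std::uint64_t ssp, std::uint64_t operand)
{
    return ssp + (operand & 0xFF) * 8;
}

// Advances the SSP by `slots` entries in steps of at most 255, so that
// no step loses its upper bits to the 8-bit truncation.
std::uint64_t AdvanceSsp(std::uint64_t ssp, std::uint64_t slots)
{
    while (slots != 0)
    {
        const std::uint64_t step = std::min<std::uint64_t>(slots, 255);
        ssp = EmulatedIncsspq(ssp, step);
        slots -= step;
    }
    return ssp;
}
```

For example, a single emulated `incsspq` with an operand of 600 moves the SSP by only 88 slots (600 & 0xFF), whereas `AdvanceSsp` covers all 600 slots in three steps of 255, 255, and 90.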
There is not, I've explicitly asked Windows folks about it.
LGTM. Thanks!
Is it possible for the shadow stack to overflow before this fix - can we use that to build a regression test? Or can the effects of this fix be observed as better performance? If there is really no way to observe the effects of this fix, I am wondering why it is needed. If that is the case, can we simplify things by leaving it to the OS to fix things up as necessary?
Yes, if we pop the SSP to the first expected IP match, but we need to pop to some other match, an overflow could be the ultimate result after enough iterations. It could take many iterations though, and a very specially crafted repro.
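As a rough illustration of "after enough iterations", the sketch below counts how many catch/resume cycles fit before a shadow stack of a given capacity overflows. It is a hypothetical model: `CyclesUntilOverflow`, the capacity, and the number of stale slots left behind per cycle are all assumed values, not measurements from the runtime or from Windows.

```cpp
#include <cstddef>

// Hypothetical model: each cycle leaves `leakedSlotsPerCycle` stale
// entries behind. Returns how many cycles complete before the next one
// would exceed `capacitySlots`. Precondition: leakedSlotsPerCycle > 0.
std::size_t CyclesUntilOverflow(std::size_t capacitySlots,
                                std::size_t leakedSlotsPerCycle)
{
    std::size_t used = 0;
    std::size_t cycles = 0;
    while (used + leakedSlotsPerCycle <= capacitySlots)
    {
        used += leakedSlotsPerCycle;
        ++cycles;
    }
    return cycles;
}
```

With a small leak per cycle and a large shadow stack the count is high, which matches the observation that existing tests may simply not run long enough to hit the overflow.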
We should create the special repro. I would expect that it should be a viable outer loop test at least.
It is possible that we already have tests when ssp grows, but we do not see that since the tests do not run long enough. I thought about some kind of asserting that shadow stack is roughly the same depth as the regular (i.e. like not twice deep), and use in some random places, but not sure how that can be done in practice, considering native frames on stack.
I will try to create a repro that would result in stack overflow without this change.
@jkotas I have added a test that fails with (shadow) stack overflow before this fix and passes after.
Just curious - is there a special status code for shadow stack overflow or does it get reported as regular stack overflow?
It gets reported as regular stack overflow.
Thank you for creating the test!
BTW, the test also demonstrates that the fixup performed by Windows is not always correct. It is a good and perhaps the only possible backward-compat heuristic - pop to the nearest shadow stack item that matches the return site. However, the site may be recorded on the shadow stack multiple times and the code may be returning not to the lowermost occurrence.
I think that it will eventually fix itself after a couple of returns - one return might pick a wrong location, but then a next return will be wrong again, so yet another fixup will be made that finally corrects things.